-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make map column non-nullable when it's a key in another map. #9147
Conversation
Signed-off-by: Chong Gao <[email protected]>
case m: MapType => | ||
// if this is a key of another map. e.g.: map1(map2(int,int), int), the map2 is the map | ||
// key of map1, map2 should be non-nullable | ||
val isMapNullable = if (isMapKey) false else nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set isMapKey instead of just going off of nullable? We set nullable to false specifically each place we set isMapKey to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider 2 scenarios.
First scenario:
struct(
field(column_1, map(int, int), nullable = true),
field(column_1, map(int, int), nullable = false)
)
We should honor the nullable in the else branch
if (isMapKey) false else nullable
Second scenario:
struct(
field(column_1, map1(map2(int, int), int), nullable = true)
)
The isMapKey
means this map is a map key in another map?
.
The isMapKey
here means the m: MapType
is a map key of another map, and its type is also Map.
We can not know isMapKey
unless that the outer writerOptionsFromField
passes it in.
Consider the map1(map2(int, int), int), the logic is from outer to inner.
writerOptionsFromField first handles map1, then handle its (key(map2(int, int)), value), here isMapKey means (map2(int, int) is key of map1.
writerOptionsFromField handles map2(int, int) (the key of outer map1), isMapKey is true, isMapKey is passed in from outer writerOptionsFromField.
writerOptionsFromField handles int (the value of outer map1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you understand what I am saying. isMapKey
is only ever set to true on line 296. On line 295 nullable is also always set to false for this same method call.
nullable = false,
writeInt96, fieldMeta, parquetFieldIdWriteEnabled, isMapKey = true).build()
There is never a situation if the current code when
is (isMapKey) false else nullable
will return anything different from nullable
by itself. If it did, then it would be an error in the code, because all map keys, no matter the type, are required to never be nullable. There is not even an option in the Spark MapType to make them nullable, which is why line 295 is hard coded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your detailed review. Fixed.
And also tested succefully with rapidsai/cudf#14003.
Once this PR is approved, I'll merge rapidsai/cudf#14003 and this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build |
2 similar comments
build |
build |
fixes #9129
Fix map column can not be non-nullable.
Depends on rapidsai/cudf#14003.